Efficient Mining of High Utility Sequential Patterns Over Data Streams
Abstract
High utility sequential pattern mining has emerged as an important topic in data mining. Although several preliminary works have been conducted on this topic, the existing studies mainly focus on mining high utility sequential patterns (HUSPs) in static databases and do not consider streaming data. Mining HUSPs over data streams is desirable for many applications, but it is not an easy task. First, streaming data arrive continuously at high speed, and the mining result should be available immediately when users request it. Second, we need to overcome the combinatorial explosion of a large search space. Third, pruning the search space for HUSP mining is difficult because the downward closure property does not hold for the utility of sequences. In this paper, we propose a new framework for mining high utility sequential patterns over data streams, a problem that has not been explored previously. A novel data structure named HUSP-Tree is proposed to maintain the essential information for mining HUSPs. HUSP-Tree can be easily updated when new data arrive and old data expire in a data stream. An efficient single-pass algorithm named HUSP-Stream is proposed to generate HUSPs from HUSP-Tree. When data arrive at or leave a sliding window, HUSP-Stream incrementally updates HUSP-Tree online to find HUSPs based on previous mining results. HUSP-Stream uses a new utility estimation model to prune the search space more effectively. Experimental results on real and synthetic datasets show that our algorithm outperforms the state-of-the-art algorithms and serves as an efficient solution to the new problem of mining high utility sequential patterns over data streams.
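To make the third difficulty concrete, the following is a minimal brute-force sketch, not the HUSP-Tree or HUSP-Stream method of the paper. It assumes the common definition in which the utility of a pattern in a sequence is the maximum utility over its occurrences, and the utility in a window is the sum over the sequences in that window. The names `pattern_utility`, `window_utility`, the profit table, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import deque

NEG = float("-inf")

def pattern_utility(pattern, sequence, profit):
    """Maximum utility of `pattern` over all its occurrences in `sequence`.

    pattern  : list of sets of items, e.g. [{"a"}, {"b"}] means a, then b
    sequence : list of dicts mapping item -> purchased quantity
    profit   : dict mapping item -> unit profit (external utility)
    Returns 0 when the pattern does not occur in the sequence.
    """
    # best[j] = best utility of the pattern prefix matched so far,
    # using only the first j itemsets of the sequence.
    best = [0] * (len(sequence) + 1)
    for element in pattern:
        nxt = [NEG] * (len(sequence) + 1)
        for j, itemset in enumerate(sequence, start=1):
            nxt[j] = nxt[j - 1]  # option 1: do not use itemset j
            if best[j - 1] != NEG and all(i in itemset for i in element):
                gain = sum(itemset[i] * profit[i] for i in element)
                nxt[j] = max(nxt[j], best[j - 1] + gain)  # option 2: match here
        best = nxt
    return best[-1] if best[-1] != NEG else 0

def window_utility(pattern, window, profit):
    """Total utility of `pattern` over all sequences currently in the window."""
    return sum(pattern_utility(pattern, seq, profit) for seq in window)

# Hypothetical toy data: unit profits and three sequences arriving in order.
profit = {"a": 2, "b": 5, "c": 1}
s1 = [{"a": 1}, {"b": 2}, {"a": 3}]
s2 = [{"a": 4}, {"c": 1}]
s3 = [{"b": 1}, {"a": 1, "c": 2}]

# Downward closure fails: in s1 the longer pattern <a, b> has a higher
# utility than its sub-pattern <a>, so a low-utility <a> cannot be pruned.
print(pattern_utility([{"a"}], s1, profit))         # 6
print(pattern_utility([{"a"}, {"b"}], s1, profit))  # 12

# Sliding window of two sequences: appending s3 makes s1 expire.
window = deque([s1, s2], maxlen=2)
print(window_utility([{"a"}, {"b"}], window, profit))  # 12
window.append(s3)
print(window_utility([{"a"}, {"b"}], window, profit))  # 0
```

With a threshold of 10 on `s1` alone, `<a>` is not a high utility pattern while its extension `<a, b>` is, which is why frequency-style pruning does not carry over. The sketch also recomputes every utility from scratch after each window update, which is the cost an incremental structure is meant to avoid.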
Similar resources
Efficiently Mining High Utility Sequential Patterns in Static and Streaming Data
High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data come contin...
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Memory-Bounded High Utility Sequential Pattern Mining over Data Streams
Mining high utility sequential patterns (HUSPs) has emerged as an important topic in data mining. However, the existing studies on this topic focus on static data and do not consider streaming data. Streaming data are fast changing, continuously generated and unbounded in amount. Such data can easily exhaust computer resources (e.g., memory) unless proper resource-aware mining is performed. In ...
A Single-scan Algorithm for Mining Sequential Patterns from Data Streams
Sequential pattern mining (SPAM) is one of the most interesting research issues of data mining. In this paper, a new research problem of mining data streams for sequential patterns is defined. A data stream is an unbounded sequence of data elements arriving at a rapid rate. Based on the characteristics of data streams, the problem complexity of mining data streams for sequential patterns is more ...
Efficient Mining of High Utility Patterns over Data Streams with a Sliding Window Method
High utility pattern (HUP) mining over data streams has become a challenging research issue in data mining. The existing sliding window-based HUP mining algorithms over stream data suffer from the level-wise candidate generation-and-test problem. Therefore, they need a large amount of execution time and memory. Moreover, their data structures are not suitable for interactive mining. To solve the...